This document is structured as follows:
- R2 of simulations based on real data
- R2 performance
- Top 25% loadings
- Approximation of the sample covariance matrix
Research questions to answer
- When does O2PLS overfit?
- Does overfitting affect interpretation?
- Where does overfitting occur?
Correctly identified top 25% loadings in very high dimensions
- Prediction of Y may be poor with 95% noise
- How does the ordering of the estimated loadings compare with that of the true loadings?
- The proportion of the true top 25% loadings that is recovered in the estimated top 25% (the TPR) is shown in a boxplot
- Only \(p=200\) is shown; results for \(p=20\) are similar
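A minimal base-R sketch of the TPR described above, assuming loadings are ranked by absolute value (they are identified only up to sign) and that the top 25% contains the ceiling of a quarter of the loadings. The helper `top25_tpr` and the simulated vectors are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: fraction of the true top 25% loadings (by absolute
# value) that also appear in the estimated top 25%
top25_tpr <- function(true_load, est_load, frac = 0.25) {
  stopifnot(length(true_load) == length(est_load))
  k <- ceiling(frac * length(true_load))                 # size of the top set
  true_top <- order(abs(true_load), decreasing = TRUE)[seq_len(k)]
  est_top  <- order(abs(est_load),  decreasing = TRUE)[seq_len(k)]
  length(intersect(true_top, est_top)) / k               # TPR in [0, 1]
}

set.seed(1)
p <- 200
w_true <- rnorm(p)                      # simulated "true" loadings
w_est  <- w_true + rnorm(p, sd = 0.5)   # noisy estimate of the true loadings
top25_tpr(w_true, w_est)
top25_tpr(w_true, -w_true)              # sign flips do not matter: returns 1
```

Ranking by absolute value makes the measure invariant to the sign ambiguity of the loadings, so only the ordering of their magnitudes is compared.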
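library(dplyr)    # filter(), %>%
library(ggplot2)
library(plotly)   # ggplotly() for an interactive version of the plot
# Boxplots of sqrt(value) per method for p = 200 only;
# the dashed line at 1 marks perfect recovery of the true top 25%
p1 <- ggplot(data = topss %>% filter(p == "200"), aes(x = method, y = sqrt(value))) +
  geom_boxplot(aes(col = method)) +
  geom_hline(yintercept = 1, col = "gray", lty = 2) +
  facet_grid(N * noise ~ nr_comp * p, scales = 'free') +
  theme_bw() + scale_x_discrete("Method") + scale_y_continuous("TPR") +
  theme(axis.title = element_text(face = "bold", size = 16))
ggplotly(p1)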
- Next, the within-run difference between PO2PLS and O2PLS
- Both \(p=20\) and \(p=200\) are included
# Put the PO2PLS and O2PLS results of the same run side by side.
# bind_cols() pairs rows by position, so this assumes that the "po2m" and
# "o2m" subsets list the runs in the same order.
topssdiff <- topss %>%
  filter(method == "po2m") %>%
  select(-method) %>%
  rename(value.po2m = value) %>%
  bind_cols(topss %>%
              filter(method == "o2m") %>%
              select(value.o2m = value)) %>%
  mutate(dif = value.po2m - value.o2m)   # > 0: PO2PLS scores higher
p2 <- topssdiff %>% ggplot(aes(x = nr_comp, y = dif)) +
  geom_boxplot(aes(col = nr_comp)) +
  facet_grid(N * noise ~ p, scales = 'free') +
  geom_hline(yintercept = 0, lty = 2, col = "gray")
ggplotly(p2)
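The `bind_cols()` pairing of the PO2PLS and O2PLS results relies on both subsets listing the runs in the same order. A more defensive base-R sketch joins on an explicit run identifier instead. The `run` column and the toy values below are hypothetical, since the columns of `topss` that identify a run are not shown here.

```r
# Hypothetical long-format results: one row per (run, method)
res <- data.frame(
  run    = rep(1:3, times = 2),
  method = rep(c("o2m", "po2m"), each = 3),
  value  = c(0.60, 0.70, 0.50, 0.65, 0.80, 0.50)
)
# Shuffle the rows to show that the pairing does not depend on row order
res <- res[c(4, 1, 6, 2, 5, 3), ]

po2m <- subset(res, method == "po2m", select = c(run, value))
o2m  <- subset(res, method == "o2m",  select = c(run, value))
names(po2m)[2] <- "value.po2m"
names(o2m)[2]  <- "value.o2m"

# Join on the run identifier instead of relying on matching row order
paired <- merge(po2m, o2m, by = "run")
paired$dif <- paired$value.po2m - paired$value.o2m
paired
```

With tidyr, `pivot_wider(names_from = method, values_from = value)` gives the same pairing, provided the remaining columns uniquely identify each run.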
Conclusions: top 25% loadings
- The difference (in favor of PO2PLS) is most affected by the noise level and the number of components
- PO2PLS has a higher TPR than O2PLS in most runs
Training and test errors of the covariance blocks Sx, Sxy and Sy
- Finally, we assess how well the joint part reconstructs the sample covariance matrix
- To this end, we compute the MSE between the model-based estimate of the covariance matrix and the sample covariance matrix
- For the test errors, the sample covariance matrix of 1e4 test samples was used as reference
- Main conclusion: PLS and O2PLS fit Sxy similarly, PPLS fits the X part (Sx) better, and PO2PLS lies in between PPLS and (O2)PLS
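As a rough illustration of the block-wise error, the sketch below splits a joint covariance matrix into the Sx, Sxy and Sy blocks and computes the MSE of each block against a training or a test sample covariance. The data-generating model and the rank-r eigen approximation used as the "model-based" estimate are stand-ins chosen for illustration, not the PLS, O2PLS, PPLS or PO2PLS estimators; `block_mse` and `sim` are hypothetical names.

```r
# Hypothetical helper: MSE per covariance block of a joint (p+q) x (p+q) matrix
block_mse <- function(S_model, S_ref, p) {
  ix <- seq_len(p)                  # indices of the X variables
  iy <- (p + 1):ncol(S_ref)         # indices of the Y variables
  mse <- function(a, b) mean((a - b)^2)
  c(Sx  = mse(S_model[ix, ix], S_ref[ix, ix]),
    Sxy = mse(S_model[ix, iy], S_ref[ix, iy]),
    Sy  = mse(S_model[iy, iy], S_ref[iy, iy]))
}

set.seed(2)
p <- 20; q <- 10; r <- 2
W <- matrix(rnorm(p * r), p, r)     # assumed X loadings
C <- matrix(rnorm(q * r), q, r)     # assumed Y loadings
# Simulate n samples with r shared latent components plus unit-variance noise
sim <- function(n) {
  t_lat <- matrix(rnorm(n * r), n, r)
  cbind(t_lat %*% t(W) + matrix(rnorm(n * p), n, p),
        t_lat %*% t(C) + matrix(rnorm(n * q), n, q))
}
train <- sim(100)
test  <- sim(1e4)

# Placeholder "model-based" estimate: rank-r eigen approximation of the
# training covariance (NOT the O2PLS/PO2PLS estimator)
S_train <- cov(train)
e <- eigen(S_train, symmetric = TRUE)
S_model <- e$vectors[, 1:r] %*% diag(e$values[1:r], nrow = r) %*% t(e$vectors[, 1:r])

block_mse(S_model, S_train, p)      # training error
block_mse(S_model, cov(test), p)    # test error, reference from 1e4 samples
```

Reporting the three blocks separately shows where a method trades fit: a model can reproduce Sxy well while fitting Sx or Sy less closely.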